Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore is that successful exploration requires an agent to first return to an interesting state ('Go') and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present a clear ablation study of post-exploration within a general intrinsically motivated goal exploration process (IMGEP) framework, which the Go-Explore paper did not provide. We isolate the potential of post-exploration by turning it on and off within the same algorithm, in both tabular and deep RL settings, on discrete navigation and continuous control tasks. Experiments on a range of MiniGrid and MuJoCo environments show that post-exploration indeed helps IMGEP agents reach more diverse states and boosts their performance. In short, our work suggests that RL researchers should consider using post-exploration in IMGEP when possible, since it is effective, method-agnostic and easy to implement.
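To make the 'Go' then 'Explore' split concrete, the following is a minimal sketch of one IMGEP-style episode with a post-exploration phase that can be switched on and off. It assumes a Gymnasium-style environment and hypothetical `goal_policy` and `archive` interfaces; it is an illustration of the idea, not the paper's actual implementation.

```python
import random

def imgep_episode_with_post_exploration(env, goal_policy, archive,
                                         max_steps=100, post_steps=10,
                                         post_explore=True):
    """Run one IMGEP-style episode: pursue a sampled goal ('Go'),
    then optionally keep exploring afterwards ('post-exploration').
    goal_policy and archive are hypothetical interfaces; archive is
    assumed to be a non-empty list of previously visited states."""
    goal = random.choice(archive)          # 'Go': target a previously reached state
    state, _ = env.reset()
    terminated = truncated = False
    # Phase 1: act with the goal-conditioned policy toward the chosen goal.
    for _ in range(max_steps):
        action = goal_policy(state, goal)
        state, reward, terminated, truncated, _ = env.step(action)
        archive.append(state)              # remember every visited state
        if terminated or truncated:
            break
    # Phase 2: post-exploration -- extra exploratory (here: random) actions
    # taken from wherever the goal-reaching phase ended.
    if post_explore and not (terminated or truncated):
        for _ in range(post_steps):
            action = env.action_space.sample()
            state, reward, terminated, truncated, _ = env.step(action)
            archive.append(state)
            if terminated or truncated:
                break
    return archive
```

The ablation in the paper amounts to comparing runs with `post_explore=True` against otherwise identical runs with `post_explore=False`.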
Non-parametric episodic memory can be used to quickly latch onto high-reward experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches, these methods only need to discover a solution once and can then repeatedly solve the task. However, episodic control solutions are stored in discrete tables, and the approach has so far only been applied to problems with discrete action spaces. This paper therefore introduces Continuous Episodic Control (CEC), a novel non-parametric episodic memory algorithm for sequential decision making in problems with a continuous action space. Results on several sparse-reward continuous control environments show that the proposed method learns faster than state-of-the-art model-free RL and memory-augmented RL algorithms, while also maintaining good long-run performance. In short, CEC is a fast approach to learning in continuous control tasks and a useful addition to parametric RL methods in hybrid approaches.
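To illustrate the general idea of non-parametric episodic control with continuous actions, here is a minimal nearest-neighbour memory sketch: it stores (state, action, return) entries and reuses the action of a nearby high-return entry, with added noise for exploration. The class name, hyperparameters, and table-update rule are assumptions for illustration and are not the published CEC algorithm.

```python
import numpy as np

class EpisodicMemoryController:
    """Illustrative nearest-neighbour episodic memory for continuous actions
    (a sketch of the general technique, not the CEC paper's method)."""

    def __init__(self, action_dim, noise_std=0.1):
        self.states, self.actions, self.returns = [], [], []
        self.action_dim = action_dim
        self.noise_std = noise_std

    def act(self, state, k=5):
        if not self.states:
            # No experience yet: act randomly in [-1, 1]^action_dim.
            return np.random.uniform(-1.0, 1.0, self.action_dim)
        dists = np.linalg.norm(np.asarray(self.states) - np.asarray(state), axis=1)
        k = min(k, len(self.states))
        neighbours = np.argpartition(dists, k - 1)[:k]
        # Among the k nearest stored states, reuse the action of the entry
        # with the highest recorded return, plus Gaussian exploration noise.
        best = neighbours[int(np.argmax(np.asarray(self.returns)[neighbours]))]
        return self.actions[best] + np.random.normal(0.0, self.noise_std, self.action_dim)

    def update(self, episode, episode_return):
        # Store the episode's transitions; a practical implementation would
        # prune the table, e.g. keeping only the best-returning entries.
        for state, action in episode:
            self.states.append(np.asarray(state))
            self.actions.append(np.asarray(action))
            self.returns.append(episode_return)
```

Because the memory is a table rather than a trained network, a good solution found once can be exploited immediately, which is the property the abstract refers to as latching onto high-reward experience.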
Deep reinforcement learning has recently gathered much attention. Impressive results have been achieved in areas as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvres such as loops and rolls. In some applications they have even become better than the best humans, as in Atari, Go, poker, and StarCraft. The way deep reinforcement learning explores complex environments is reminiscent of how children learn: by playfully trying things out, getting feedback, and trying again. The computer seems to truly possess aspects of human learning; this goes to the heart of the dream of artificial intelligence. These research successes have not gone unnoticed by educators, and universities have started to offer courses on the subject. The aim of this book is to provide a comprehensive overview of the field of deep reinforcement learning. The book is written for graduate students of artificial intelligence, and for researchers and practitioners who wish to better understand deep reinforcement learning methods and their challenges. We assume an undergraduate-level understanding of computer science and artificial intelligence; the programming language of the book is Python. We describe the foundations, algorithms, and applications of deep reinforcement learning. We cover the established model-free and model-based methods that form the basis of the field. Developments move quickly, so we also cover advanced topics: deep multi-agent reinforcement learning, deep hierarchical reinforcement learning, and deep meta-learning.